Poster: hans.heiri Date: Jul 31, 2017 7:02am
Forum: faqs Subject: robots.txt no longer supported to exclude page from being crawled?

dear community members,

somebody from our legal department told us that the "internet archive" will no longer honor "robots.txt" to exclude websites from being crawled.

in the FAQ section, I found a note that one has to send an email to info@archive.org to "have the page excluded from the wayback machine", but it says nothing about whether "robots.txt" still works.

does anyone have new information regarding this topic?

any help appreciated & thanks in advance,
hans

Reply

Poster: MeditateOrDie Date: Jul 31, 2017 11:42am
Forum: faqs Subject: Re: robots.txt no longer supported to exclude page from being crawled?

A properly configured robots.txt will still prevent crawling -
I'm not aware of any change to that policy.
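
As a rough illustration of what "properly configured" could look like, here is a minimal sketch that checks a robots.txt against a crawler name using Python's standard `urllib.robotparser`. The `ia_archiver` user-agent token is an assumption (it has historically been associated with the Wayback Machine, but please verify it against the Internet Archive's current documentation); `example.com`, `SomeOtherBot` and the `is_allowed` helper are hypothetical names used only for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The "ia_archiver" token is an assumption and
# should be checked against the Internet Archive's current documentation.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("ia_archiver", "https://example.com/index.html"),
        ("SomeOtherBot", "https://example.com/index.html"),
        ("SomeOtherBot", "https://example.com/private/report.html"),
    ]
    for agent, url in checks:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

This only shows how a well-behaved crawler would interpret the file; it does not tell you whether a particular crawler actually honors it.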

Note: Not all site indexers/crawlers obey robots.txt,
so if you don't want something to be accessible, configure
your servers to use permissions or other access controls to
prevent unauthorized access to the places
you'd prefer to protect.

Reply

Poster: hans.heiri Date: Aug 1, 2017 3:20am
Forum: faqs Subject: Re: robots.txt no longer supported to exclude page from being crawled?

Thanks for your quick reply! I'm going to share this with our legal guys.

Thanks again,
hans